ZSEQ: an interactive DNA sequence analysis program designed for microcomputers.

Author

  • M J Bishop
Abstract

ZSEQ is a self-documenting interactive DNA sequence analysis package designed for small computers based on Z80 microprocessors. The ZSEQ philosophy is to provide basic operations which can be economically performed on small machines. ZSEQ is written in BCPL, a language which has been carefully designed to be near optimal for flexible text-processing applications (Richards & Whitby-Strevens, 1979). ZSEQ provides counting, listing, translating, filing, splicing and pattern-finding facilities. ZSEQ does not provide searches for genes, secondary structure determinations or comparisons of two or more sequences, activities which are best carried out on larger machines. The only sequence length limitation on the programs is the value of the maximum integer held in one word of the BCPL implementation. (On the Z80 this is 32766.)

ZSEQ is designed to work on input files in EMBL format and the program should be used in conjunction with the EMBL Nucleotide Sequence Data Library User Manual ('EMBL Manual') (Cameron et al., 1983). Each entry in the EMBL database corresponds to a single sequence, and an entry is structured so that it can be easily read by humans or machines. Each entry is composed of lines, the first two symbols in a line indicating its type. An entry is identified by its ID line, which carries a unique eight-letter identifier. Information lines follow, particularly notable being the FT lines, which locate the features of importance such as exons and introns. The sequence itself starts after the SQ line and the entry is terminated by the // line. Input files to ZSEQ may contain multiple ID entries, so that automatic processing of many sequences is made possible. The option is also provided to select individual ID entries from a file containing many sequences. ZSEQ will also work on an input file containing a sequence alone, or will extract multiple sequences from input files in SEQ or GENBANK formats.
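The entry layout described above can be illustrated with a short sketch. This is not ZSEQ's BCPL code; it is a simplified Python reading of the description, assuming only that the first two characters of a line give its type, that the first field of the ID line is the identifier, and that sequence lines sit between SQ and //. The function name `parse_embl_entries` and the sample entries are hypothetical, invented for illustration.

```python
def parse_embl_entries(text):
    """Split EMBL-style flat-file text into a list of entry dicts.

    Each dict holds the identifier from the ID line, the raw FT lines,
    and the sequence symbols found between the SQ line and '//'.
    """
    entries = []
    current = None
    in_sequence = False
    for line in text.splitlines():
        code = line[:2]  # the first two symbols give the line type
        if code == "ID":
            # An ID line opens a new entry; its first field is the identifier.
            fields = line[2:].split()
            current = {"id": fields[0] if fields else "",
                       "features": [], "sequence": ""}
            in_sequence = False
        elif current is None:
            continue  # ignore anything before the first ID line
        elif code == "//":
            # The terminator closes the entry.
            entries.append(current)
            current = None
            in_sequence = False
        elif code == "SQ":
            in_sequence = True
        elif in_sequence:
            # Drop spacing and position numbers, keep every other symbol.
            symbols = [ch for ch in line
                       if not ch.isspace() and not ch.isdigit()]
            current["sequence"] += "".join(symbols).upper()
        elif code == "FT":
            current["features"].append(line[2:].strip())
    return entries


# Hypothetical two-entry file, invented for illustration only.
SAMPLE = """\
ID   EXAMPLE1  standard; DNA; 12 BP.
FT   CDS             1..12
SQ   Sequence 12 BP;
     atggcctaag tt
//
ID   EXAMPLE2  standard; DNA; 6 BP.
SQ   Sequence 6 BP;
     ggatcc
//
"""

entries = parse_embl_entries(SAMPLE)
# Selecting one ID entry from a multi-entry file, as in manual mode.
selected = [e for e in entries if e["id"] == "EXAMPLE2"]
```

Processing every element of `entries` in turn corresponds loosely to running over all ID entries in a file, while filtering by identifier mirrors selecting an individual entry.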
It will not convert the GENBANK sequence positions into an EMBL features table. ZSEQ always verifies the symbols present in sequences, and has facilities for alteration of unrecognized symbols to the set defined in Appendix B.2 of the EMBL Manual. The symbol # may be used to pad aligned sequences [note that the symbol - (hyphen) cannot be used for this purpose as it means any of A, C, G, or T]. The output from ZSEQ will be either sequences in EMBL format suitable for storage on disk for further processing or output suitable for printing.

ZSEQ is divided by function into four activities:
(1) Counting: counts may be made of bases, doublets, triplets, oligonucleotides of up to six in length, and codons.
(2) Filing: output may be sent to disk for further processing. Formatting, splicing, complementation, reverse complementation and translation may be performed.
(3) Listing: sequences may be listed with position numbers as single or double-stranded DNA or translated in up to six reading frames.
(4) Pattern-finding: partial or total matches to an input pattern may be found in one or both strands.

When running in automatic mode, ZSEQ will process each ID entry in a file from position 1 to the // line. This allows rapid processing with minimal effort. In manual mode each individual ID entry may be selected, and then parts of the sequence may be further selected by position
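To make the counting, filing and pattern-finding operations concrete, here is a minimal Python sketch of what such operations compute. It is an illustration under stated assumptions, not ZSEQ's implementation: all function names are hypothetical, only the unambiguous symbols A, C, G and T are handled (padding and ambiguity symbols are not), the standard genetic code is assumed for translation, and a "partial match" is taken to mean a match with a bounded number of mismatches.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_oligos(seq, k):
    """Count overlapping oligonucleotides of length k (1 = bases, 2 = doublets)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate one forward reading frame (0, 1 or 2); '*' marks a stop codon."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_pattern(seq, pattern, max_mismatches=0, both_strands=False):
    """Return (position, strand) hits with 1-based positions.

    Reverse-strand hits are found by searching for the reverse complement
    of the pattern; the reported position is the leftmost matching base
    on the forward strand.
    """
    targets = [("+", pattern)]
    if both_strands:
        targets.append(("-", reverse_complement(pattern)))
    hits = []
    for strand, pat in targets:
        for i in range(len(seq) - len(pat) + 1):
            window = seq[i:i + len(pat)]
            mismatches = sum(a != b for a, b in zip(window, pat))
            if mismatches <= max_mismatches:
                hits.append((i + 1, strand))
    return hits


seq = "ATGGCCTAAGTT"  # made-up sequence for illustration
base_counts = count_oligos(seq, 1)
doublet_counts = count_oligos(seq, 2)
protein = translate(seq)
# The three remaining frames come from translating the reverse complement.
reverse_frames = [translate(reverse_complement(seq), f) for f in range(3)]
exact_hits = find_pattern(seq, "TAA")
partial_hits = find_pattern(seq, "TAC", max_mismatches=1)
```

Translating the three frames of the sequence and the three frames of its reverse complement gives the six reading frames mentioned under listing.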

Similar resources

DNA sequence analysis on the IBM-PC

We have developed, for the IBM-PC microcomputer, a menu driven, interactive set of programs which provide the functions routinely used for DNA sequence data analyses.

A comprehensive sequence analysis program for the IBM personal computer

We have developed a versatile program for the analysis of nucleic acid and protein sequences on the IBM Personal Computer. The program is interactive and self-instructing. It contains all the features generally found in sequence analysis programs on large computers, including extensive homology routines, as well as new procedures for the entry of sequence data. The program contains facilities t...

A convenient and adaptable microcomputer environment for DNA and protein sequence manipulation and analysis

We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version o...

Zseq: An Approach for Preprocessing Next-Generation Sequencing Data

Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, se...

Portable microcomputer software for nucleotide sequence analysis.

The most common types of nucleotide sequence data analyses and handling can be done more conveniently and inexpensively on microcomputers than on large time-sharing systems. We present a package of computer programs for the analysis of DNA and RNA sequence data which overcomes many of the limitations imposed by microcomputers, while offering most of the features of programs commonly available o...

Journal title:
  • Biochemical Society Transactions

Volume 12, Issue 6

Pages: -

Publication date: 1984